Welcome, friends!

Please go forth and scroll down to begin your journey though my data re-analysis assignment for AN 597. Here I have provided some basic background and performed/replicated all analyses in a submitted manuscript, Warkentin et al. 201-tbd. The original paper and full dataset used are uploaded onto the repository in pdf and csv form, respectively.

Introduction

Red-eyed treefrogs, Agalychnis callidryas (pictured above as an adult and a juvenile), commonly lay their eggs (pictured below left) on leafy substrate overhanging tropical ponds .

These arboreal embryos can hatch prematurely to escape from egg predators, such as the parrot snake (pictured above right), cued by vibrations in attacks.

It is known that young embryos modulate hatching based on multiple frequency and temporal properties of cues, reducing false alarms that unnecessarily expose them to risk in the water.

Hypothesis

Because the cost of false alarms decreases developmentally we hypothesized that, if sampling costs are high or stimuli ambiguous, older embryos would accept more false alarms.

Methods

Experimental design

We assessed changes in sensitivity to sampling costs using vibration playbacks (to trays of embryos, pictured above) at two developmental stages. We designed sets of 3 stimuli, based on prior results with younger embryos, so one elicited high hatching and two elicited similarly low hatching, but sampling costs differed between low-hatching stimuli.

Statistics

The original paper and full dataset used are uploaded onto the repository in pdf and csv form, respectively. I will replicate all analyses included in the paper, including:
* mann-whitney-wilcoxon tests
* two-sample t-tests
* wilcoxon rank sum tests
* binomial GLMs
* interaction plots
* ANOVAs
* AIC model comparisons

Code preparation

File logistics

First, we will clear the environment and set our working directory:

rm(list=ls()) #clear environment
setwd('/Users/juliejung/Documents/GitHub/AN 597/data-reanalysis-assignment') #set working directory       

User Defined Functions

Here we’ll define some functions that will help us later. The following function will give us the mode:

# gives mode
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Load packages

Here, we install and load packages that we’ll need later on in the code. NOTE: uncomment as needed.

# install.packages("stargazer")
# install.packages("knitr")
# install.packages("dplyr")
# install.packages("curl")
# install.packages("sciplot")
# install.packages("ggplot2")
# install.packages("MASS")
#install.packages("multcomp")
# #install.packages("AICcmodavg", dependencies = TRUE)
# #install.packages("VGAM")
# install.packages("car")

library("stargazer")
## 
## Please cite as:
##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.1. https://CRAN.R-project.org/package=stargazer
library("knitr")
library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library("curl")
library("sciplot")
library("ggplot2")
library("MASS")
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
library("multcomp")
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## 
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
## 
##     geyser
#library("VGAM") #not available for R version 3.3.3
#library("AICcmodavg") #not available for R version 3.3.3
library("car")
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode

Read in data

Next, we want to read in the relevant datafile (from the data-reanalysis-assignment repo) and show a few lines of raw data in your output (e.g., using head()).

f <- curl("https://raw.githubusercontent.com/jamjulie/data-reanalysis-assignment/master/WarkentinJungRuedaMcDaniel-DataForDeposit.csv")
data <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = TRUE)
head(data)
##       Date Age.d. Set TestOrder STIMULUS HFvLS GroupedComparisonYounger
## 1 20151013    5.2   1         1       LS    LS                   Paired
## 2 20151013    5.2   1         2       LF  <NA>                   Single
## 3 20151013    5.2   1         3       HF    HF                   Paired
## 4 20151013    5.2   2         4       HF    HF                   Paired
## 5 20151013    5.2   2         5       LS    LS                   Paired
## 6 20151013    5.2   2         6       LF  <NA>                   Single
##   LFvLS GroupedComparisonOlder SetupTime.h.  Tray AlreadyHatchedorRuptured
## 1    LS                 Paired         3.02 390-2                        1
## 2    LF                 Paired         3.35   390                        3
## 3  <NA>                 Single         3.75   370                        2
## 4  <NA>                 Single         4.08 388-2                        1
## 5    LS                 Paired         4.45   388                        0
## 6    LF                 Paired         4.82 383-2                        3
##   HatchedinSetup LessDeveloped TestEggs FirstHatch.s. FirstHatch.s.600
## 1              1             0       13            90               90
## 2              0             0       12            79               79
## 3              0             0       13            NA              600
## 4              2             0       12            NA              600
## 5              0             0       15            67               67
## 6              0             1       11            24               24
##   CycleLength.s. FirstHatch.cycles. FirstHatch.cycles.600 X5minHatch
## 1             17           5.294118              5.294118          2
## 2              2          39.500000             39.500000          2
## 3              2                 NA            300.000000          0
## 4              2                 NA            300.000000          0
## 5             17           3.941176              3.941176          3
## 6              2          12.000000             12.000000          1
##   X10minHatch ProportionHatched GutCoilStage T1length T2length T3length
## 1           2             0.154           NA   10.793    8.124   11.124
## 2           2             0.167           NA   10.514   10.698   11.134
## 3           0             0.000           NA   10.489   10.527       11
## 4           0             0.000            0   10.244   10.768   10.644
## 5           3             0.200            0   10.655   10.346   11.501
## 6           2             0.182            1   10.856   11.158   10.323
##   MeanLength
## 1     10.014
## 2     10.782
## 3     10.672
## 4     10.552
## 5     10.834
## 6     10.779

Results, part I

Now that we’ve gone through all our preparations, we can run analyses and report replicated results! The following results follow the order of presented results in the paper (pdf in repo).

First, we will check there are at least 8 eggs per tray.

min(data$TestEggs, na.rm=T) 
## [1] 8

Now let’s look at the structure of our data, and make sure the variables are defined as we want them to be defined.

str(data)
## 'data.frame':    81 obs. of  28 variables:
##  $ Date                    : int  20151013 20151013 20151013 20151013 20151013 20151013 20151013 20151013 20151013 20151013 ...
##  $ Age.d.                  : num  5.2 5.2 5.2 5.2 5.2 5.2 5.2 5.2 5.2 5.2 ...
##  $ Set                     : int  1 1 1 2 2 2 3 3 3 4 ...
##  $ TestOrder               : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ STIMULUS                : Factor w/ 3 levels "HF","LF","LS": 3 2 1 1 3 2 3 1 2 1 ...
##  $ HFvLS                   : Factor w/ 2 levels "HF","LS": 2 NA 1 1 2 NA 2 1 NA 1 ...
##  $ GroupedComparisonYounger: Factor w/ 2 levels "Paired","Single": 1 2 1 1 1 2 1 1 2 1 ...
##  $ LFvLS                   : Factor w/ 2 levels "LF","LS": 2 1 NA NA 2 1 2 NA 1 NA ...
##  $ GroupedComparisonOlder  : Factor w/ 2 levels "Paired","Single": 1 1 2 2 1 1 1 2 1 2 ...
##  $ SetupTime.h.            : num  3.02 3.35 3.75 4.08 4.45 4.82 5.27 5.77 6.08 6.52 ...
##  $ Tray                    : Factor w/ 81 levels "370","371-2",..: 27 26 1 24 23 18 17 25 22 19 ...
##  $ AlreadyHatchedorRuptured: int  1 3 2 1 0 3 1 1 0 0 ...
##  $ HatchedinSetup          : int  1 0 0 2 0 0 0 0 0 1 ...
##  $ LessDeveloped           : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ TestEggs                : int  13 12 13 12 15 11 14 14 15 14 ...
##  $ FirstHatch.s.           : int  90 79 NA NA 67 24 NA 58 34 NA ...
##  $ FirstHatch.s.600        : int  90 79 600 600 67 24 600 58 34 600 ...
##  $ CycleLength.s.          : int  17 2 2 2 17 2 17 2 2 2 ...
##  $ FirstHatch.cycles.      : num  5.29 39.5 NA NA 3.94 ...
##  $ FirstHatch.cycles.600   : num  5.29 39.5 300 300 3.94 ...
##  $ X5minHatch              : int  2 2 0 0 3 1 0 2 5 0 ...
##  $ X10minHatch             : int  2 2 0 0 3 2 1 2 5 0 ...
##  $ ProportionHatched       : num  0.154 0.167 0 0 0.2 0.182 0.071 0.143 0.333 0 ...
##  $ GutCoilStage            : int  NA NA NA 0 0 1 1 0 0 1 ...
##  $ T1length                : num  10.8 10.5 10.5 10.2 10.7 ...
##  $ T2length                : num  8.12 10.7 10.53 10.77 10.35 ...
##  $ T3length                : Factor w/ 80 levels "10.323","10.368",..: 21 22 16 7 41 1 10 13 2 20 ...
##  $ MeanLength              : num  10 10.8 10.7 10.6 10.8 ...
data$GutCoilStage<-as.numeric(as.character(data$GutCoilStage))
data$T3length<-as.numeric(as.character(data$T3length))
data$ProportionHatched<-as.numeric(as.character(data$ProportionHatched))
data$STIMULUS<-as.factor(data$STIMULUS)
data$FirstHatch.s. <-as.numeric(as.character(data$FirstHatch.s. ))
data$FirstHatch.cycles. <-as.numeric(as.character(data$FirstHatch.cycles. ))

Here let’s subset the data into 2 age groups: younger and older.

younger<-subset(data, Age.d.==5.2, na.rm=T)
older<-subset(data, Age.d.==5.7, na.rm=T)

Since the paper reports the mode of stages for each age category, let’s find that:

Mode(younger$GutCoilStage)
## [1] 1
Mode(older$GutCoilStage)
## [1] 3

Younger embryos showed a range of stages from intact yolks to curved furrows while older embryos had straight furrows to full coils (N = 39 and 42; modes: straight furrow(1) and full coil(3)).

Summarize the data

Here we’ll create a table summary of statistics for hatching rates (in proportion hatched) for each age category, grouped by stimulus treatment. We use the {dplyr} package to summarize our data.

younger_errorstats_g8<-
  younger %>%
  group_by(STIMULUS) %>%
  summarize(count = n(),
            mean = mean(ProportionHatched),
            SD = sd(ProportionHatched), 
            SE = sd(ProportionHatched)/sqrt(n())
            )
younger_errorstats_g8$AgeGroup <- 5.2
kable(younger_errorstats_g8,title="Mean & SD & SE",digits=3)
STIMULUS count mean SD SE AgeGroup
HF 13 0.061 0.056 0.016 5.2
LF 13 0.376 0.307 0.085 5.2
LS 13 0.099 0.085 0.024 5.2
older_errorstats_g8<-
  older %>%
  group_by(STIMULUS) %>%
  summarize(count = n(),
            mean = mean(ProportionHatched),
            SD = sd(ProportionHatched), 
            SE = sd(ProportionHatched)/sqrt(n())
            )
older_errorstats_g8$AgeGroup <- 5.7
kable(older_errorstats_g8,title="Mean & SD & SE",digits=3)
STIMULUS count mean SD SE AgeGroup
HF 14 0.211 0.211 0.056 5.7
LF 14 0.669 0.232 0.062 5.7
LS 14 0.582 0.371 0.099 5.7

Should we use parametric or non-parametric tests?

hist(data$GutCoilStage) #non parametric

hist(younger$GutCoilStage)

hist(older$GutCoilStage)

We’ll use mann-whitney-wilcoxon test because our 2 data samples (younger and older) are independent/come from distinct populations, so the samples do not affect each other.

Also because our data are non-parametric.

wtest<-wilcox.test(data$GutCoilStage~data$Age.d., mu = 0, alt="two.sided", paired = F, conf.int=T, conf.level=0.95)
qnorm(wtest$p.value) #z-value to report
## [1] -6.606223
wtest
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  data$GutCoilStage by data$Age.d.
## W = 88.5, p-value = 1.971e-11
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -1.999996 -1.000003
## sample estimates:
## difference in location 
##              -1.999985

Younger embryos showed a range of stages from intact yolks to curved furrows while older embryos had straight furrows to full coils (N = 39 and 42; modes: straight furrow(1) and full coil(3); Wilcoxon test, Z=6.606223, P =1.97e-11).

Moving right along, here we’ll calculation the total lengths (mean and se) of hatchlings in each age group.

mean(younger$MeanLength)
## [1] 11.06536
mean(older$MeanLength)
## [1] 11.64307
se(younger$MeanLength)
## [1] 0.07123206
se(older$MeanLength)
## [1] 0.0546835

Embryos grew between test ages. Hatchling total length increased from 11.1 ± 0.07 to 11.6 ± 0.05 mm (mean ± SE, N = 39 and 42 trays respectively).

Since the data are normal and we get equal variances, we’ll perform a t-test.

hist(data$MeanLength)#normal

var.test(younger$MeanLength, older$MeanLength) # equal variances
## 
##  F test to compare two variances
## 
## data:  younger$MeanLength and older$MeanLength
## F = 1.5756, num df = 38, denom df = 41, p-value = 0.1554
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.8396433 2.9795488
## sample estimates:
## ratio of variances 
##           1.575628
t.test(younger$MeanLength, older$MeanLength, alternative=c("two.sided"), paired=FALSE, var.equal=TRUE, conf.level=0.95)
## 
##  Two Sample t-test
## 
## data:  younger$MeanLength and older$MeanLength
## t = -6.4874, df = 79, p-value = 7.01e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.7549655 -0.4004594
## sample estimates:
## mean of x mean of y 
##  11.06536  11.64307

Embryos grew between test ages. Hatchling total length increased from 11.1 ± 0.07 to 11.6 ± 0.05 mm (mean ± SE, N = 39 and 42 trays respectively; t-test, t79 = 6.4871, P =7.018e-9; Fig. 4B).

Embryos also hatched spontaneously between test ages, so we want to calculate some means and SEs for spontaneous hatching before the test period.

mean(younger$AlreadyHatchedorRuptured)
## [1] 0.5128205
se(younger$AlreadyHatchedorRuptured)
## [1] 0.1317985
mean(older$AlreadyHatchedorRuptured)
## [1] 0.9047619
se(older$AlreadyHatchedorRuptured)
## [1] 0.1219733

From trays tested, more of the older embryos had hatched spontaneously before the test period (0.9 ± 0.1 vs. 0.5 ± 0.1 embryos per tray).

Since the data are non-parametric, we’ll perform a mann-whitney test:

hist(data$AlreadyHatchedorRuptured)#non-parametric, mann-whitney

wtest<-wilcox.test(data$AlreadyHatchedorRuptured~data$Age.d., mu = 0, alt="two.sided", paired = F, conf.int=T, conf.level=0.95)
## Warning in wilcox.test.default(x = c(1L, 3L, 2L, 1L, 0L, 3L, 1L, 1L, 0L, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = c(1L, 3L, 2L, 1L, 0L, 3L, 1L, 1L, 0L, :
## cannot compute exact confidence intervals with ties
qnorm(wtest$p.value) #z-value to report
## [1] -2.439028
wtest
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  data$AlreadyHatchedorRuptured by data$Age.d.
## W = 559.5, p-value = 0.007363
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -9.999772e-01 -6.210509e-05
## sample estimates:
## difference in location 
##          -2.881878e-05

From trays tested, more of the older embryos had hatched spontaneously before the test period (0.9 ± 0.1 vs. 0.5 ± 0.1 embryos per tray; Wilcoxon test, Z=2.439, P = 0.0074) and relatively fewer of the trays with older eggs had sufficient individuals to attempt setup (KMW unquantified personal observation).

Now we’ll look at the number of embryos that hatched at set up:

# hatching from set up
mean(younger$HatchedinSetup)
## [1] 0.8974359
se(younger$HatchedinSetup)
## [1] 0.2317418
mean(older$HatchedinSetup)
## [1] 2.452381
se(older$HatchedinSetup)
## [1] 0.2974971

More of the older embryos hatched during setup and acclimation (2.5 ± 0.3 vs. 0.9 ± 0.2 per tray).

Since the data are non-parametric, we’ll perform a mann-whitney test:

hist(data$HatchedinSetup)#non-parametric, mann-whitney

wtest<-wilcox.test(data$HatchedinSetup~data$Age.d., mu = 0, alt="two.sided", paired = F, conf.int=T, conf.level=0.95)
## Warning in wilcox.test.default(x = c(1L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = c(1L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, :
## cannot compute exact confidence intervals with ties
qnorm(wtest$p.value) #z-value to report
## [1] -3.886517
wtest
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  data$HatchedinSetup by data$Age.d.
## W = 404.5, p-value = 5.085e-05
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -2.0000029 -0.9999489
## sample estimates:
## difference in location 
##              -1.000005

More of the older embryos hatched during setup and acclimation (2.5 ± 0.3 vs. 0.9 ± 0.2 per tray; Wilcoxon test, Z=404.5, P = 5.085e-5).

We want to calculate some means and ses for the number of test eggs per tray in older vs. younger.

#smaller number of test eggs per tray in older than younger
mean(younger$TestEggs)
## [1] 13.46154
se(younger$TestEggs)
## [1] 0.2404622
mean(older$TestEggs)
## [1] 11.45238
se(older$TestEggs)
## [1] 0.3089873
min(younger$TestEggs)
## [1] 8
max(younger$TestEggs)
## [1] 15
min(older$TestEggs)
## [1] 8
max(older$TestEggs)
## [1] 15

This resulted in smaller numbers of test eggs per tray (older 11.5 ± 0.3, younger 13.5 ± 0.2, range 8–15 at both ages).

Since the data are non-parametric, we’ll perform a mann-whitney test:

hist(data$TestEggs)#non-parametric, mann-whitney

wtest<-wilcox.test(data$TestEggs~data$Age.d., mu = 0, alt="two.sided", paired = F, conf.int=T, conf.level=0.95)
## Warning in wilcox.test.default(x = c(13L, 12L, 13L, 12L, 15L, 11L, 14L, :
## cannot compute exact p-value with ties
## Warning in wilcox.test.default(x = c(13L, 12L, 13L, 12L, 15L, 11L, 14L, :
## cannot compute exact confidence intervals with ties
qnorm(wtest$p.value) #z-value to report
## [1] -4.361872
wtest
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  data$TestEggs by data$Age.d.
## W = 1289.5, p-value = 6.448e-06
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  1.000044 2.999979
## sample estimates:
## difference in location 
##               2.000011

This resulted in smaller numbers of test eggs per tray (Wilcoxon test, Z=4.361872, P =6.448e-6; older 11.5 ± 0.3, younger 13.5 ± 0.2, range 8–15 at both ages).

Results, part II: Visualizations

Next, we want to know whether age, stimulus, and/or their interaction affected the hatching response of embryos in playbacks

We’ll do this by performing a binomial GLM:

#binomial glm
glm1<-glm(cbind(X10minHatch,TestEggs)~Age.d., family=binomial(logit), data=data)
glm2<-glm(cbind(X10minHatch,TestEggs)~STIMULUS, family=binomial(logit), data=data)
glm3<-glm(cbind(X10minHatch,TestEggs)~Age.d.+STIMULUS, family=binomial(logit), data=data)
glm4<-glm(cbind(X10minHatch,TestEggs)~Age.d.*STIMULUS, family=binomial(logit), data=data)

glms<-list(glm1, glm2, glm3, glm4)
#aictab(glms)

Anova(glm4)
## Analysis of Deviance Table (Type II tests)
## 
## Response: cbind(X10minHatch, TestEggs)
##                 LR Chisq Df Pr(>Chisq)    
## Age.d.            59.578  1  1.175e-14 ***
## STIMULUS          57.863  2  2.724e-13 ***
## Age.d.:STIMULUS   11.934  2   0.002562 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(glm4)
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: cbind(X10minHatch, TestEggs)
## 
## Terms added sequentially (first to last)
## 
## 
##                 Df Deviance Resid. Df Resid. Dev
## NULL                               80     261.88
## Age.d.           1   61.250        79     200.63
## STIMULUS         2   57.863        77     142.76
## Age.d.:STIMULUS  2   11.934        75     130.83

Age, stimulus, and their interaction affected the hatching response of embryos in playbacks (Fig. 6A). Older embryos hatched more than did younger ones (binomial GLM, age effect: χ2 = 107.71, df = 1, P < 0.0001), and the LF stimulus elicited the strongest hatching response (stimulus effect: χ2 = 57.57, df = 2, P < 0.0001). However, the significant age × stimulus interaction revealed that the developmental increase in hatching was not uniform across stimuli (interaction effect: χ2 = 12.55, df = 2, P = 0.0019).

This is the plot for our proportion hatched data (Figure 6 in paper)

interaction.plot(data$Age.d., data$STIMULUS, data$ProportionHatched, 
                 leg.bty="0", pch=c(18,24), 
                 ylab="Proportion hatched", 
                 xlab="Age (days)", 
                 trace.label="")

The image of the figure from the original paper that I’m replicating is included in a folder called “img” within my repo and embedded here:

Specifically, we’re looking at part A of the figure.

ALTERNATIVELY, we can use ggplot to create a much prettier plot, with standard errors.

younger_errorstats_g8$AgeGroup<-as.factor(younger_errorstats_g8$AgeGroup)
older_errorstats_g8$AgeGroup<-as.factor(older_errorstats_g8$AgeGroup)
combined_errorstats<- rbind(younger_errorstats_g8, older_errorstats_g8)

The following code creates our figure:

ggplot(data=combined_errorstats, aes(x=AgeGroup, y=mean, colour=STIMULUS)) + 
  geom_point(size=3) +
  geom_errorbar(data=combined_errorstats, aes(ymin=mean-SE, ymax=mean+SE), width=0.05)+
  theme_bw(20)+
  ylab("Proportion of tray hatched\n")+
  xlab("\n Age (d)")+
  annotate("text", x = 1, y = 0.2, label = "N=13")+
  annotate("text", x = 2, y = 0.4, label = "N=14")

Moving on: Here we perform separate binomial GLMs at each age, with grouped categories of treatment stimuli.

#binomial glm for YOUNGER
glm1<-glm(cbind(X10minHatch,TestEggs)~STIMULUS, family=binomial(logit), data=younger)
Anova(glm1) # stimulus effect
## Analysis of Deviance Table (Type II tests)
## 
## Response: cbind(X10minHatch, TestEggs)
##          LR Chisq Df Pr(>Chisq)    
## STIMULUS   40.133  2  1.929e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# HF-LS contrast
glm2<-glm(cbind(X10minHatch,TestEggs)~HFvLS, family=binomial(logit), data=younger)
Anova(glm2) 
## Analysis of Deviance Table (Type II tests)
## 
## Response: cbind(X10minHatch, TestEggs)
##       LR Chisq Df Pr(>Chisq)
## HFvLS   1.2655  1     0.2606
# (HF+LS)–LF contrast
glm3<-glm(cbind(X10minHatch,TestEggs)~GroupedComparisonYounger, family=binomial(logit), data=younger)
Anova(glm3) 
## Analysis of Deviance Table (Type II tests)
## 
## Response: cbind(X10minHatch, TestEggs)
##                          LR Chisq Df Pr(>Chisq)    
## GroupedComparisonYounger   38.867  1  4.536e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Younger embryos showed equally little response to both the HF and LS stimuli but a substantial hatching response to the LF stimulus (Fig. 6A, stimulus effect: χ2 = 40.133, df = 2, P < 0.929e-09; HF–LS contrast χ2 = 1.2655, df=1, P = 0.2606; (HF+LS)–LF contrast χ2 = 38.867, df=1, P = 4.536e-10).

#binomial glm for OLDER
glm4<-glm(cbind(X10minHatch,TestEggs)~STIMULUS, family=binomial(logit), data=older)
Anova(glm4) # stimulus effect
## Analysis of Deviance Table (Type II tests)
## 
## Response: cbind(X10minHatch, TestEggs)
##          LR Chisq Df Pr(>Chisq)    
## STIMULUS   29.664  2  3.618e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# LF–LS contrast
glm5<-glm(cbind(X10minHatch,TestEggs)~LFvLS, family=binomial(logit), data=older)
Anova(glm5) 
## Analysis of Deviance Table (Type II tests)
## 
## Response: cbind(X10minHatch, TestEggs)
##       LR Chisq Df Pr(>Chisq)
## LFvLS  0.60737  1     0.4358
# (LF+LS)–HF contrast
glm6<-glm(cbind(X10minHatch,TestEggs)~GroupedComparisonOlder, family=binomial(logit), data=older)
Anova(glm6) 
## Analysis of Deviance Table (Type II tests)
## 
## Response: cbind(X10minHatch, TestEggs)
##                        LR Chisq Df Pr(>Chisq)    
## GroupedComparisonOlder   29.057  1  7.028e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In contrast, older embryos showed similarly strong hatching responses to both LF and LS stimuli and a weaker response to the HF stimulus (stimulus effect: χ2 = 29.664, df = 2, P =3.618e-7; LF–LS contrast χ2 = 0.4358, df=1, P = 0.4358; (LF+LS)–HF contrast χ2 = 29.057, df=1, P=7.028e-8).

Results, part III: Latency Analysis

Because risk in predator attacks probably accrues as a function of time, but information from temporal properties accrues as a function of cycles we conducted analyses of latency measured both in time (seconds) and in cycles (dividing time by the cycle length of the stimulus).

min(data$FirstHatch.s., na.rm=T)
## [1] 9
max(data$FirstHatch.s., na.rm=T)
## [1] 420

In trays where hatching occurred, the latency until the first embryo hatched ranged from 9–420 s.

Are our data parametric?

hist(data$FirstHatch.s.) #nonparametric

hist(data$FirstHatch.cycles.) #nonparametric

hist(log(data$FirstHatch.s.)) #parametric!

hist(log(data$FirstHatch.cycles.)) #parametric!

Since they are, we can use ANOVAs of log-transformed data to test for effects of age, stimulus and age-by-stimulus interaction on the latency to hatch.

# IN SECONDS, Anova
aov1 <- aov(log(FirstHatch.s.) ~ Age.d.*STIMULUS, data=data)
summary(aov1)
##                 Df Sum Sq Mean Sq F value   Pr(>F)    
## Age.d.           1 15.153  15.153   39.52 4.57e-08 ***
## STIMULUS         2 10.518   5.259   13.72 1.33e-05 ***
## Age.d.:STIMULUS  2  2.232   1.116    2.91   0.0625 .  
## Residuals       58 22.238   0.383                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 17 observations deleted due to missingness
# IN CYCLES, Anova
aov2 <- aov(log(FirstHatch.cycles.) ~ Age.d.*STIMULUS, data=data)
summary(aov2)
##                 Df Sum Sq Mean Sq F value   Pr(>F)    
## Age.d.           1  18.99  18.993   49.54 2.53e-09 ***
## STIMULUS         2  55.13  27.565   71.89  < 2e-16 ***
## Age.d.:STIMULUS  2   2.23   1.116    2.91   0.0625 .  
## Residuals       58  22.24   0.383                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 17 observations deleted due to missingness

Latency varied among stimuli and decreased with age, with a marginally non-significant interaction effect (Fig. 6B,C). This pattern held whether measuring latency in seconds or in cycles of vibration (seconds: age, F1,58 = 39.52, P < 0.0001; stimulus, F2,58 = 13.72, P < 0.0001; interaction, F2,58 = 2.91, P = 0.0625; cycles: age, F1,58 = 49.54, P < 0.0001; stimulus, F2,58 = 71.89, P < 0.0001; interaction, F2,58 = 2.91, P = 0.0625).

Alternatively, we can use GLMs:

# IN SECONDS, GLM METHOD
glm1<-glm(FirstHatch.s.~Age.d., family=gaussian(link = "identity"), data=data)
glm2<-glm(FirstHatch.s.~STIMULUS, family=gaussian(link = "identity"), data=data)
glm3<-glm(FirstHatch.s.~Age.d.+STIMULUS, family=gaussian(link = "identity"), data=data)
glm4<-glm(FirstHatch.s.~Age.d.*STIMULUS, family=gaussian(link = "identity"), data=data)

glms<-list(glm1, glm2, glm3, glm4)
#aictab(glms)

Anova(glm4)
## Analysis of Deviance Table (Type II tests)
## 
## Response: FirstHatch.s.
##                 LR Chisq Df Pr(>Chisq)    
## Age.d.            25.396  1  4.669e-07 ***
## STIMULUS          14.836  2  0.0006002 ***
## Age.d.:STIMULUS   10.553  2  0.0051100 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we plot the latencies for the first embryo in each tray to hatch:

LtoH_younger_errorstats_g8<-
  younger %>%
  group_by(STIMULUS) %>%
  summarize(count = n(),
            mean = mean(FirstHatch.s., na.rm=T),
            SD = sd(FirstHatch.s., na.rm=T), 
            SE = sd(FirstHatch.s., na.rm=T)/sqrt(n())
            )
LtoH_younger_errorstats_g8$AgeGroup <- "5.2"
kable(LtoH_younger_errorstats_g8,title="Mean & SD & SE",digits=3)
STIMULUS count mean SD SE AgeGroup
HF 13 166.625 130.271 36.131 5.2
LF 13 39.455 33.554 9.306 5.2
LS 13 111.000 74.883 20.769 5.2
LtoH_older_errorstats_g8<-
  older %>%
  group_by(STIMULUS) %>%
  summarize(count = n(),
            mean = mean(FirstHatch.s., na.rm=T),
            SD = sd(FirstHatch.s., na.rm=T), 
            SE = sd(FirstHatch.s., na.rm=T)/sqrt(n())
            )
LtoH_older_errorstats_g8$AgeGroup <- "5.7"
kable(LtoH_older_errorstats_g8,title="Mean & SD & SE",digits=3)
STIMULUS count mean SD SE AgeGroup
HF 14 34.000 22.276 5.954 5.7
LF 14 18.643 7.023 1.877 5.7
LS 14 32.615 20.642 5.517 5.7

Now we can make the plot!

LtoH_combined_errorstats<- rbind(LtoH_younger_errorstats_g8, LtoH_older_errorstats_g8)

The following code creates our figure:

ggplot(data=LtoH_combined_errorstats, aes(x=AgeGroup, y=mean, colour=STIMULUS)) + 
  geom_point(size=3) +
  geom_errorbar(data=LtoH_combined_errorstats, aes(ymin=mean-SE, ymax=mean+SE), width=0.05)+
  theme_bw(20)+
  ylab("Latency to hatch (seconds)\n")+
  xlab("\n Age (d)")

The image of the figure from the original paper that I’m replicating is included in a folder called “img” within my repo and embedded here:

Specifically, we’ve replotted part B of this figure.

Recall that because risk in predator attacks probably accrues as a function of time, but information from temporal properties accrues as a function of cycles, we conducted analyses of latency measured both in time (seconds) and in cycles (dividing time by the cycle length of the stimulus).

## IN CYCLES, GLM METHOD
glm1<-glm(FirstHatch.cycles.~Age.d., family=gaussian(link = "identity"), data=data)
glm2<-glm(FirstHatch.cycles.~STIMULUS, family=gaussian(link = "identity"), data=data)
glm3<-glm(FirstHatch.cycles.~Age.d.+STIMULUS, family=gaussian(link = "identity"), data=data)
glm4<-glm(FirstHatch.cycles.~Age.d.*STIMULUS, family=gaussian(link = "identity"), data=data)

glms<-list(glm1, glm2, glm3, glm4)
#aictab(glms)

Anova(glm4)
## Analysis of Deviance Table (Type II tests)
## 
## Response: FirstHatch.cycles.
##                 LR Chisq Df Pr(>Chisq)    
## Age.d.            15.999  1  6.337e-05 ***
## STIMULUS          30.858  2  1.992e-07 ***
## Age.d.:STIMULUS   18.715  2  8.630e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This pattern held whether measuring latency in seconds or in cycles of vibration (seconds: age, F1,58 = 44.07, P < 0.0001; stimulus, F2,58 = 13.88, P < 0.0001; interaction, F2,58 = 2.91, P = 0.0625; cycles: age, F1,58 = 44.05, P < 0.0001; stimulus, F2,58 = 31.78, P < 0.0001; interaction, F2,58 = 2.91, P = 0.0625).

Here we plot the latencies for the first embryo in each tray to hatch IN CYCLES:

In order to do this, we’ll get the SEs using our summarySE function.

cycles_younger_errorstats_g8<-
  younger %>%
  group_by(STIMULUS) %>%
  summarize(count = n(),
            mean = mean(FirstHatch.cycles., na.rm=T),
            SD = sd(FirstHatch.cycles., na.rm=T), 
            SE = sd(FirstHatch.cycles., na.rm=T)/sqrt(n())
            )
cycles_younger_errorstats_g8$AgeGroup <- "5.2"
kable(cycles_younger_errorstats_g8,title="Mean & SD & SE",digits=3)
STIMULUS count mean SD SE AgeGroup
HF 13 83.312 65.136 18.065 5.2
LF 13 19.727 16.777 4.653 5.2
LS 13 6.529 4.405 1.222 5.2
cycles_older_errorstats_g8<-
  older %>%
  group_by(STIMULUS) %>%
  summarize(count = n(),
            mean = mean(FirstHatch.cycles., na.rm=T),
            SD = sd(FirstHatch.cycles., na.rm=T), 
            SE = sd(FirstHatch.cycles., na.rm=T)/sqrt(n())
            )
cycles_older_errorstats_g8$AgeGroup <- "5.7"
kable(cycles_older_errorstats_g8,title="Mean & SD & SE",digits=3)
STIMULUS count mean SD SE AgeGroup
HF 14 17.000 11.138 2.977 5.7
LF 14 9.321 3.512 0.939 5.7
LS 14 1.919 1.214 0.325 5.7

Now we can make the plot!

cycles_combined_errorstats<- rbind(cycles_younger_errorstats_g8, cycles_older_errorstats_g8)

The following code creates our figure:

ggplot(data=cycles_combined_errorstats, aes(x=AgeGroup, y=mean, colour=STIMULUS)) + 
  geom_point(size=3) +
  geom_errorbar(data=cycles_combined_errorstats, aes(ymin=mean-SE, ymax=mean+SE), width=0.05)+
  theme_bw(20)+
  ylab("Latency to hatch (cycles)\n")+
  xlab("\n Age (d)")

The image of the figure from the original paper that I’m replicating is included in a folder called “img” within my repo and embedded here:

Specifically, we’ve replotted part C of this figure.

HF<-subset(data, STIMULUS=="HF", na.rm=T)
LF<-subset(data, STIMULUS=="LF", na.rm=T)
LS<-subset(data, STIMULUS=="LS", na.rm=T)

HFyounger<-subset(HF, Age.d.==5.2, na.rm=T)
HFolder<-subset(HF, Age.d.==5.7, na.rm=T)
LFyounger<-subset(LF, Age.d.==5.2, na.rm=T)
LFolder<-subset(LF, Age.d.==5.7, na.rm=T)
LSyounger<-subset(LS, Age.d.==5.2, na.rm=T)
LSolder<-subset(LS, Age.d.==5.7, na.rm=T)

mean(LFyounger$FirstHatch.s., na.rm=T)
## [1] 39.45455
se(LFyounger$FirstHatch.s., na.rm=T)
## [1] 10.11692
mean(LFolder$FirstHatch.s., na.rm=T)
## [1] 18.64286
se(LFolder$FirstHatch.s., na.rm=T)
## [1] 1.877007
mean(LSyounger$FirstHatch.cycles., na.rm=T)
## [1] 6.529412
se(LSyounger$FirstHatch.cycles., na.rm=T)
## [1] 1.557356
mean(LSolder$FirstHatch.cycles., na.rm=T)
## [1] 1.918552
se(LSolder$FirstHatch.cycles., na.rm=T)
## [1] 0.3367673

The shortest latency times were for the LF stimulus (younger: 39.5 ± 10.1 s; older 18.6 ± 1.9 s), while latencies for the LS stimulus included the fewest cycles of the vibration pattern (younger: 6.5 ± 1.6; older: 1.9 ± 0.3 cycles).

We conducted analyses of latency both on the subset of trays from which at least one individual hatched and also on the full dataset, assigning a latency of 600 s (i.e., the full playback plus post-playback observation period) to trays in which no embryos hatched.

# IN SECONDS, Anova
aov1 <- aov(FirstHatch.s.600 ~ Age.d.*STIMULUS, data=data)
summary(aov1)
##                 Df  Sum Sq Mean Sq F value  Pr(>F)   
## Age.d.           1  497562  497562  11.521 0.00110 **
## STIMULUS         2  501036  250518   5.801 0.00455 **
## Age.d.:STIMULUS  2   51388   25694   0.595 0.55418   
## Residuals       75 3239146   43189                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# IN SECONDS, GLM METHOD
glm1<-glm(FirstHatch.s.600~Age.d., family=gaussian(link = "identity"), data=data)
glm2<-glm(FirstHatch.s.600~STIMULUS, family=gaussian(link = "identity"), data=data)
glm3<-glm(FirstHatch.s.600~Age.d.+STIMULUS, family=gaussian(link = "identity"), data=data)
glm4<-glm(FirstHatch.s.600~Age.d.*STIMULUS, family=gaussian(link = "identity"), data=data)

glms<-list(glm1, glm2, glm3, glm4)
#aictab(glms)

Anova(glm4)
## Analysis of Deviance Table (Type II tests)
## 
## Response: FirstHatch.s.600
##                 LR Chisq Df Pr(>Chisq)    
## Age.d.           11.5207  1  0.0006883 ***
## STIMULUS         11.6011  2  0.0030259 ** 
## Age.d.:STIMULUS   1.1899  2  0.5516027    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If we assign a latency of 600 s to trays in which no embryos hatched, age and stimulus effects remain strong and the interaction effect is much weaker (latency in seconds: age, F1,75 = 27.98, P < 0.0001; stimulus, F2,75 = 7.77, P = 0.0009; interaction F2,75 = 0.75, P = 0.47).

Discussion

Older embryos showed lower latency to hatch, indicating less cue sampling, and more hatching overall. Their similarly high responses to two of the stimuli suggest they ceased to discriminate using slow-to-assess properties as indicators of safety; however, they showed little hatching if either frequency spectrum or a fast temporal pattern allowed rapid assessment of low risk.

Conclusion

Developmental changes in behavior due to ontogenetic adaptation of decision processes are likely to be widespread. Vibration-cued hatching allows us to use the power of playback experiments to improve our understanding of the development of adaptive embryo behavior.

Temporal Patterns Playback Experiment

Code preparation

Read in data

Next, we want to read in the relevant datafile (from the data-reanalysis-assignment repo) and show a few lines of raw data in your output (e.g., using head()).

t <- curl("https://raw.githubusercontent.com/jamjulie/data-reanalysis-assignment/master/Temporal-clean.csv")
T_data <- read.csv(t, header = TRUE, sep = ",", stringsAsFactors = TRUE)
head(T_data)
##       Date Set Order LayingDate Age TofDay Complete SUhour SUmin SUsec
## 1 20160704   1     1   20160629   5   5dAM        Y      6     0    45
## 2 20160704   1     2   20160629   5   5dAM        Y      6    21    10
## 3 20160704   1     3   20160629   5   5dAM        Y      6    39    10
## 4 20160704   2     4   20160629   5   5dAM        Y      6    55    40
## 5 20160704   2     5   20160629   5   5dAM        Y      7    13    12
## 6 20160704   2     6   20160629   5   5dAM        Y      7    24    30
##   SUtimehr  SUagehr Clutch Tray ESU Stimulus MinTo30y30 TestEggs PBKhour
## 1 6.012500 126.0125   C218    1  15        s   3.529412        9       6
## 2 6.352778 126.3528   C218    2  14        f 100.000000       13       6
## 3 6.652778 126.6528   C220    1  10        c  30.000000        8       6
## 4 6.927778 126.9278   C219    4  15        f 100.000000       15       7
## 5 7.220000 127.2200   C219    3  15        c  30.000000       15       7
## 6 7.408333 127.4083   C219    2  15        s   3.529412       15       7
##   PBKmin PBKsec PBKtimehr PBKtimemin   PBKage FirstHhour FirstHmin
## 1      7      0  6.116667   367.0000 126.1167         NA        NA
## 2     27      0  6.450000   387.0000 126.4500          6        30
## 3     44     55  6.748611   404.9167 126.7486          6        45
## 4      2      0  7.033333   422.0000 127.0333          7         2
## 5     18     15  7.304167   438.2500 127.3042          7        18
## 6     30     20  7.505556   450.3333 127.5056         NA        NA
##   FirstHsec FirstHtimehr FirstHtimemin FirstHatch.s. FirstHatch.cycles.
## 1        NA           NA            NA            NA                 NA
## 2        31     6.508611      390.5167     3.5166667          351.66667
## 3         3     6.750833      405.0500     0.1333333            4.00000
## 4        52     7.047778      422.8667     0.8666667           86.66667
## 5        21     7.305833      438.3500     0.1000000            3.00000
## 6        NA           NA            NA            NA                 NA
##   FirstHatch.s.600 EPPBK EP3m      PropH DorsalIMG1 DorsalIMG2 DorsalIMG3
## 1      600.0000000     9    9 0.00000000       4838       4839       4840
## 2        3.5166667    12   12 0.07692308       4842       4844       4845
## 3        0.1333333     1    1 0.87500000       4846       4847       4850
## 4        0.8666667     9    9 0.40000000       4854       4855       4857
## 5        0.1000000     0    0 1.00000000       4858       4860       4861
## 6      600.0000000    15   15 0.00000000       4862       4866       4867
##   TadLength1 TadLength2 TadLength3
## 1      1.194      1.239      1.258
## 2      1.227      1.277      1.228
## 3      1.153      1.168      1.171
## 4      1.149      1.120      1.103
## 5      1.164      1.102      1.097
## 6      1.153      1.150      1.173

Results, part I

Looking at how many test eggs:

quantile(T_data$TestEggs, na.rm=T)
##   0%  25%  50%  75% 100% 
##    1    3    6   12   15

Now let’s look at the structure of our data, and make sure the variables are defined as we want them to be defined.

str(T_data)
## 'data.frame':    132 obs. of  41 variables:
##  $ Date              : int  20160704 20160704 20160704 20160704 20160704 20160704 20160704 20160704 20160704 20160704 ...
##  $ Set               : int  1 1 1 2 2 2 3 3 3 4 ...
##  $ Order             : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ LayingDate        : int  20160629 20160629 20160629 20160629 20160629 20160629 20160629 20160629 20160629 20160629 ...
##  $ Age               : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ TofDay            : Factor w/ 2 levels "5dAM","6dAM": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Complete          : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...
##  $ SUhour            : int  6 6 6 6 7 7 7 8 8 8 ...
##  $ SUmin             : int  0 21 39 55 13 24 42 1 16 33 ...
##  $ SUsec             : int  45 10 10 40 12 30 20 50 30 35 ...
##  $ SUtimehr          : num  6.01 6.35 6.65 6.93 7.22 ...
##  $ SUagehr           : num  126 126 127 127 127 ...
##  $ Clutch            : Factor w/ 86 levels "C218","C219",..: 1 1 3 2 2 2 6 5 5 4 ...
##  $ Tray              : int  1 2 1 4 3 2 1 1 2 1 ...
##  $ ESU               : int  15 14 10 15 15 15 15 13 13 14 ...
##  $ Stimulus          : Factor w/ 3 levels "c","f","s": 3 2 1 2 1 3 2 3 1 1 ...
##  $ MinTo30y30        : num  3.53 100 30 100 30 ...
##  $ TestEggs          : int  9 13 8 15 15 15 13 12 10 14 ...
##  $ PBKhour           : int  6 6 6 7 7 7 7 8 8 8 ...
##  $ PBKmin            : int  7 27 44 2 18 30 49 6 21 39 ...
##  $ PBKsec            : int  0 0 55 0 15 20 0 45 30 0 ...
##  $ PBKtimehr         : num  6.12 6.45 6.75 7.03 7.3 ...
##  $ PBKtimemin        : num  367 387 405 422 438 ...
##  $ PBKage            : num  126 126 127 127 127 ...
##  $ FirstHhour        : int  NA 6 6 7 7 NA 7 8 8 8 ...
##  $ FirstHmin         : int  NA 30 45 2 18 NA 49 7 21 39 ...
##  $ FirstHsec         : int  NA 31 3 52 21 NA 10 15 43 24 ...
##  $ FirstHtimehr      : num  NA 6.51 6.75 7.05 7.31 ...
##  $ FirstHtimemin     : num  NA 391 405 423 438 ...
##  $ FirstHatch.s.     : num  NA 3.517 0.133 0.867 0.1 ...
##  $ FirstHatch.cycles.: num  NA 351.7 4 86.7 3 ...
##  $ FirstHatch.s.600  : num  600 3.517 0.133 0.867 0.1 ...
##  $ EPPBK             : int  9 12 1 9 0 15 12 9 3 10 ...
##  $ EP3m              : int  9 12 1 9 0 15 12 9 3 10 ...
##  $ PropH             : num  0 0.0769 0.875 0.4 1 ...
##  $ DorsalIMG1        : Factor w/ 132 levels "4838","4842",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ DorsalIMG2        : int  4839 4844 4847 4855 4860 4866 4871 4878 4882 4891 ...
##  $ DorsalIMG3        : int  4840 4845 4850 4857 4861 4867 4873 4879 4886 4892 ...
##  $ TadLength1        : num  1.19 1.23 1.15 1.15 1.16 ...
##  $ TadLength2        : num  1.24 1.28 1.17 1.12 1.1 ...
##  $ TadLength3        : num  1.26 1.23 1.17 1.1 1.1 ...

Here let’s subset the data into 2 age groups: younger and older.

T_data$MeanTadLength <- ave(T_data$TadLength1, T_data$TadLength2, T_data$TadLength3)
T_data$AlreadyHatchedorRuptured <- 15 - T_data$ESU
T_data$HatchedinSetup <- T_data$ESU - T_data$TestEggs
T_data$NumHatch <- T_data$TestEggs - T_data$EP3m
  
T_younger<-subset(T_data, TofDay=="5dAM", na.rm=T)
T_older<-subset(T_data, TofDay=="6dAM", na.rm=T)

Summarize the data

Here we’ll create a table summary of statistics for hatching rates (in proportion hatched) for each age category, grouped by stimulus treatment. We use the {dplyr} package to summarize our data.

T_younger_errorstats<-
  T_younger %>%
  group_by(Stimulus) %>%
  summarize(count = n(),
            mean = mean(PropH),
            SD = sd(PropH),
            SE = sd(PropH)/sqrt(n())
            )
T_younger_errorstats$AgeGroup <- "younger"
kable(T_younger_errorstats,title="Mean & SD & SE",digits=3)
Stimulus count mean SD SE AgeGroup
c 14 0.592 0.353 0.094 younger
f 14 0.128 0.157 0.042 younger
s 14 0.084 0.122 0.033 younger
T_older_errorstats<-
  T_older %>%
  group_by(Stimulus) %>%
  summarize(count = n(),
            mean = mean(PropH),
            SD = sd(PropH),
            SE = sd(PropH)/sqrt(n())
            )
T_older_errorstats$AgeGroup <- "older"
kable(T_older_errorstats,title="Mean & SD & SE",digits=3)
Stimulus count mean SD SE AgeGroup
c 30 0.861 0.164 0.030 older
f 30 0.519 0.312 0.057 older
s 30 0.796 0.192 0.035 older

Should we use parametric or non-parametric tests?

hist(T_data$MeanTadLength) #non parametric

hist(T_younger$MeanTadLength)

hist(T_older$MeanTadLength)

We’ll use mann-whitney-wilcoxon test because our 2 data samples (younger and older) are independent/come from distinct populations, so the samples do not affect each other.

wtest<-wilcox.test(T_data$MeanTadLength~T_data$TofDay, mu = 0, alt="two.sided", paired = F, conf.int=T, conf.level=0.95)
qnorm(wtest$p.value) #z-value to report
## [1] -7.566642
wtest
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  T_data$MeanTadLength by T_data$TofDay
## W = 322.5, p-value = 1.915e-14
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -0.11600925 -0.07797926
## sample estimates:
## difference in location 
##            -0.09797561

Younger embryos showed a range of tadpole lengths (Wilcoxon test, Z=, P =).

Moving right along, here we’ll calculate the total lengths (mean and se) of hatchlings in each age group.

mean(T_younger$MeanTadLength)
## [1] 1.141905
mean(T_older$MeanTadLength)
## [1] 1.238856
se(T_younger$MeanTadLength)
## [1] 0.007930286
se(T_older$MeanTadLength)
## [1] 0.005936841

Embryos grew between test ages. Hatchling total length increased from 11.1 ± 0.07 to 11.6 ± 0.05 mm (mean ± SE, N = 39 and 42 trays respectively).

Since the data are normal and we get equal variances, we’ll perform a t-test.

hist(T_data$MeanTadLength)#normal

var.test(T_younger$MeanTadLength, T_older$MeanTadLength) # equal variances
## 
##  F test to compare two variances
## 
## data:  T_younger$MeanTadLength and T_older$MeanTadLength
## F = 0.83267, num df = 41, denom df = 89, p-value = 0.5207
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.5035275 1.4506392
## sample estimates:
## ratio of variances 
##          0.8326714
t.test(T_younger$MeanTadLength, T_older$MeanTadLength, alternative=c("two.sided"), paired=FALSE, var.equal=TRUE, conf.level=0.95)
## 
##  Two Sample t-test
## 
## data:  T_younger$MeanTadLength and T_older$MeanTadLength
## t = -9.4647, df = 130, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.11721617 -0.07668542
## sample estimates:
## mean of x mean of y 
##  1.141905  1.238856

Embryos grew between test ages. Hatchling total length increased from ___ ± ___ to ___ ± ___ mm (mean ± SE, N = ___ and ___ trays respectively; t-test, t___ = , P =; Fig. ___).

Embryos also hatched spontaneously between test ages, so we want to calculate some means and SEs for spontaneous hatching before the test period.

mean(T_younger$AlreadyHatchedorRuptured)
## [1] 1.261905
se(T_younger$AlreadyHatchedorRuptured)
## [1] 0.2210195
mean(T_older$AlreadyHatchedorRuptured)
## [1] 7.211111
se(T_older$AlreadyHatchedorRuptured)
## [1] 0.3563214

From trays tested, more of the older embryos had hatched spontaneously before the test period (____ ± ____ vs. ____ ± ____ embryos per tray).

Since the data are non-parametric, we’ll perform a mann-whitney test:

hist(T_data$AlreadyHatchedorRuptured)#non-parametric, mann-whitney

hist(log(T_data$AlreadyHatchedorRuptured))

wtest<-wilcox.test(T_data$AlreadyHatchedorRuptured~T_data$TofDay, mu = 0, alt="two.sided", paired = F, conf.int=T, conf.level=0.95)
qnorm(wtest$p.value) #z-value to report
## [1] -7.765664
wtest
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  T_data$AlreadyHatchedorRuptured by T_data$TofDay
## W = 290, p-value = 4.061e-15
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -7.000016 -5.000004
## sample estimates:
## difference in location 
##              -6.000054

From trays tested, more of the older embryos had hatched spontaneously before the test period (___ ± ___ vs. ___ ± ___ embryos per tray; Wilcoxon test, Z=, P =) and relatively fewer of the trays with older eggs had sufficient individuals to attempt setup (KMW unquantified personal observation).

Now we’ll look at the number of embryos that hatched at set up:

# hatching from set up
mean(T_younger$HatchedinSetup)
## [1] 0.6666667
se(T_younger$HatchedinSetup)
## [1] 0.1726566
mean(T_older$HatchedinSetup)
## [1] 3.133333
se(T_older$HatchedinSetup)
## [1] 0.2691829

More of the older embryos hatched during setup and acclimation (___ ± ___ vs. ___ ± ___ per tray).

Since the data are non-parametric, we’ll perform a mann-whitney test:

hist(T_data$HatchedinSetup)#non-parametric, mann-whitney

wtest<-wilcox.test(T_data$HatchedinSetup~T_data$TofDay, mu = 0, alt="two.sided", paired = F, conf.int=T, conf.level=0.95)
qnorm(wtest$p.value) #z-value to report
## [1] -6.383105
wtest
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  T_data$HatchedinSetup by T_data$TofDay
## W = 586, p-value = 8.677e-11
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -2.999939 -1.000030
## sample estimates:
## difference in location 
##              -1.999999

More of the older embryos hatched during setup and acclimation (___ ± ___ vs. ___ ± ___ per tray; Wilcoxon test, Z=, P = ).

We want to calculate some means and ses for the number of test eggs per tray in older vs. younger.

#smaller number of test eggs per tray in older than younger
mean(T_younger$TestEggs)
## [1] 13.07143
se(T_younger$TestEggs)
## [1] 0.2978687
mean(T_older$TestEggs)
## [1] 4.655556
se(T_older$TestEggs)
## [1] 0.2592417
min(T_younger$TestEggs)
## [1] 8
max(T_younger$TestEggs)
## [1] 15
min(T_older$TestEggs)
## [1] 1
max(T_older$TestEggs)
## [1] 13

This resulted in smaller numbers of test eggs per tray (older 11.5 ± 0.3, younger 13.5 ± 0.2, range 8–15 at both ages).

Since the data are non-parametric, we’ll perform a mann-whitney test:

hist(T_data$TestEggs)#non-parametric, mann-whitney

wtest<-wilcox.test(T_data$TestEggs~T_data$TofDay, mu = 0, alt="two.sided", paired = F, conf.int=T, conf.level=0.95)
qnorm(wtest$p.value) #z-value to report
## [1] -8.959937
wtest
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  T_data$TestEggs by T_data$TofDay
## W = 3731, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  8.000090 9.999987
## sample estimates:
## difference in location 
##               8.999986

This resulted in smaller numbers of test eggs per tray (Wilcoxon test, Z=, P =-6; older ± , younger ± , range – at both ages).

Results, part II: Visualizations

Next, we want to know whether age, stimulus, and/or their interaction affected the hatching response of embryos in playbacks

We’ll do this by performing a binomial GLM:

#binomial glm
glm<-glm(cbind(NumHatch,TestEggs)~TofDay*Stimulus, family=binomial(logit), data=T_data)
Anova(glm)
## Analysis of Deviance Table (Type II tests)
## 
## Response: cbind(NumHatch, TestEggs)
##                 LR Chisq Df Pr(>Chisq)    
## TofDay            75.795  1  < 2.2e-16 ***
## Stimulus          46.653  2  7.405e-11 ***
## TofDay:Stimulus   36.788  2  1.027e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Age, stimulus, and their interaction affected the hatching response of embryos in playbacks (Fig. 6A). Older embryos hatched more than did younger ones (binomial GLM, age effect: χ2 = , df = , P ), and the LF stimulus elicited the strongest hatching response (stimulus effect: χ2 = , df = , P ). However, the significant age × stimulus interaction revealed that the developmental increase in hatching was not uniform across stimuli (interaction effect: χ2 = , df = , P = ).

This is the plot for our proportion hatched data (Figure 6 in paper)

interaction.plot(T_data$TofDay, T_data$Stimulus, T_data$PropH,
                 leg.bty="0", pch=c(18,24),
                 ylab="Proportion hatched",
                 xlab="Age group",
                 trace.label="")

ALTERNATIVELY, we can use ggplot to create a much prettier plot, with standard errors.

T_younger_errorstats$AgeGroup<-as.factor(T_younger_errorstats$AgeGroup)
T_older_errorstats$AgeGroup<-as.factor(T_older_errorstats$AgeGroup)
T_combined_errorstats<- rbind(T_younger_errorstats, T_older_errorstats)

The following code creates our figure:

ggplot(data=T_combined_errorstats, aes(x=AgeGroup, y=mean, colour=Stimulus)) +
  geom_point(size=3) +
  geom_errorbar(data=T_combined_errorstats, aes(ymin=mean-SE, ymax=mean+SE), width=0.05)+
  theme_bw(20)+
  ylab("Proportion of tray hatched\n")+
  xlab("\n Age (d)")

  #annotate("text", x = 1, y = 0.2, label = "N=13")+
  #annotate("text", x = 2, y = 0.4, label = "N=14")

Moving on: Here we perform separate binomial GLMs at each age, with grouped categories of treatment stimuli.

#binomial glm for YOUNGER
glm2<-glm(cbind(NumHatch,TestEggs)~Stimulus, family=binomial(logit), data=T_younger)
Anova(glm2)
## Analysis of Deviance Table (Type II tests)
## 
## Response: cbind(NumHatch, TestEggs)
##          LR Chisq Df Pr(>Chisq)    
## Stimulus   75.439  2  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#binomial glm for OLDER
glm2<-glm(cbind(NumHatch,TestEggs)~Stimulus, family=binomial(logit), data=T_older)
Anova(glm2)
## Analysis of Deviance Table (Type II tests)
## 
## Response: cbind(NumHatch, TestEggs)
##          LR Chisq Df Pr(>Chisq)  
## Stimulus   8.0014  2     0.0183 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Younger embryos showed equally little response to both the HF and LS stimuli but a substantial hatching response to the LF stimulus (Fig. , stimulus effect: χ2 = , df = , P < ; HF–LS contrast χ2 = P = ; (HF+LS)–LF contrast χ2 = , P <). In contrast, older embryos showed similarly strong hatching responses to both LF and LS stimuli and a weaker response to the HF stimulus (stimulus effect: χ2 = , df = , P < ; LF–LS contrast χ2 = , P =; (LF+LS)–HF contrast χ2 = , P <).

Results, part III: Latency Analysis

Because risk in predator attacks probably accrues as a function of time, but information from temporal properties accrues as a function of cycles we conducted analyses of latency measured both in time (seconds) and in cycles (dividing time by the cycle length of the stimulus).

min(T_data$FirstHatch.s., na.rm=T)
## [1] 0.05
max(T_data$FirstHatch.s., na.rm=T)
## [1] 4.166667

In trays where hatching occurred, the latency until the first embryo hatched ranged from 9–420 s.

Are our data parametric?

hist(T_data$FirstHatch.s.) #nonparametric

hist(T_data$FirstHatch.cycles.) #nonparametric

hist(log(T_data$FirstHatch.s.)) #parametric!

hist(log(T_data$FirstHatch.cycles.)) #parametric!

Since they are, we can use ANOVAs of log-transformed data to test for effects of age, stimulus and age-by-stimulus interaction on the latency to hatch.

# IN SECONDS, Anova
aov1 <- aov(log(FirstHatch.s.) ~ TofDay*Stimulus, data=T_data)
summary(aov1)
##                  Df Sum Sq Mean Sq F value   Pr(>F)    
## TofDay            1   1.09   1.090   1.665    0.200    
## Stimulus          2  22.38  11.190  17.095 3.61e-07 ***
## TofDay:Stimulus   2   0.85   0.426   0.650    0.524    
## Residuals       107  70.04   0.655                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 19 observations deleted due to missingness
# IN CYCLES, Anova
aov2 <- aov(log(FirstHatch.cycles.) ~ TofDay*Stimulus, data=T_data)
summary(aov2)
##                  Df Sum Sq Mean Sq F value Pr(>F)    
## TofDay            1   3.24    3.24   4.955 0.0281 *  
## Stimulus          2 204.69  102.35 156.354 <2e-16 ***
## TofDay:Stimulus   2   0.85    0.43   0.650 0.5240    
## Residuals       107  70.04    0.65                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 19 observations deleted due to missingness

Latency varied among stimuli and decreased with age, with a marginally non-significant interaction effect (Fig. ). This pattern held whether measuring latency in seconds or in cycles of vibration (seconds: age, F, = , P < ; stimulus, F, = , P <; interaction, F, = , P = ; cycles: age, F, = , P < ; stimulus, F, = , P < ; interaction, F, = , P = ).

Alternatively, we can use GLMs:

# IN SECONDS, GLM METHOD
glm4<-glm(FirstHatch.s.~TofDay*Stimulus, family=gaussian(link = "identity"), data=T_data)
Anova(glm4)
## Analysis of Deviance Table (Type II tests)
## 
## Response: FirstHatch.s.
##                 LR Chisq Df Pr(>Chisq)    
## TofDay            6.5532  1    0.01047 *  
## Stimulus         24.8408  2  4.035e-06 ***
## TofDay:Stimulus   3.4810  2    0.17544    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we plot the latencies for the first embryo in each tray to hatch:

T_LtoH_younger_errorstats<-
  T_younger %>%
  group_by(Stimulus) %>%
  summarize(count = n(),
            mean = mean(FirstHatch.s., na.rm=T),
            SD = sd(FirstHatch.s., na.rm=T),
            SE = sd(FirstHatch.s., na.rm=T)/sqrt(n())
            )
T_LtoH_younger_errorstats$AgeGroup <- "younger"
kable(T_LtoH_younger_errorstats,title="Mean & SD & SE",digits=3)
Stimulus count mean SD SE AgeGroup
c 14 0.413 0.275 0.074 younger
f 14 1.590 1.535 0.410 younger
s 14 1.717 1.489 0.398 younger
T_LtoH_older_errorstats<-
  T_older %>%
  group_by(Stimulus) %>%
  summarize(count = n(),
            mean = mean(FirstHatch.s., na.rm=T),
            SD = sd(FirstHatch.s., na.rm=T),
            SE = sd(FirstHatch.s., na.rm=T)/sqrt(n())
            )
T_LtoH_older_errorstats$AgeGroup <- "older"
kable(T_LtoH_older_errorstats,title="Mean & SD & SE",digits=3)
Stimulus count mean SD SE AgeGroup
c 30 0.319 0.146 0.027 older
f 30 1.022 0.950 0.173 older
s 30 0.858 0.747 0.136 older

Now we can make the plot!

T_LtoH_combined_errorstats<- rbind(T_LtoH_younger_errorstats, T_LtoH_older_errorstats)

The following code creates our figure:

ggplot(data=T_LtoH_combined_errorstats, aes(x=AgeGroup, y=mean, colour=Stimulus)) +
  geom_point(size=3) +
  geom_errorbar(data=T_LtoH_combined_errorstats, aes(ymin=mean-SE, ymax=mean+SE), width=0.05)+
  theme_bw(20)+
  ylab("Latency to hatch (seconds)\n")+
  xlab("\n Age group")

The image of the figure from the original paper that I’m replicating is included in a folder called “img” within my repo and embedded here:

Specifically, we’ve replotted part B of this figure.

Recall that because risk in predator attacks probably accrues as a function of time, but information from temporal properties accrues as a function of cycles, we conducted analyses of latency measured both in time (seconds) and in cycles (dividing time by the cycle length of the stimulus).

## IN CYCLES, GLM METHOD
glm4<-glm(FirstHatch.cycles.~TofDay*Stimulus, family=gaussian(link = "identity"), data=T_data)
Anova(glm4)
## Analysis of Deviance Table (Type II tests)
## 
## Response: FirstHatch.cycles.
##                 LR Chisq Df Pr(>Chisq)    
## TofDay             1.951  1     0.1625    
## Stimulus          77.972  2     <2e-16 ***
## TofDay:Stimulus    3.538  2     0.1705    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This pattern held whether measuring latency in seconds or in cycles of vibration (seconds: age, F, = , P < ; stimulus, F, = , P < ; interaction, F, = , P = ; cycles: age, F, = , P < ; stimulus, F, = , P < ; interaction, F, = , P = ).

Here we plot the latencies for the first embryo in each tray to hatch IN CYCLES:

In order to do this, we’ll get the SEs using our summarySE function.

T_cycles_younger_errorstats<-
  T_younger %>%
  group_by(Stimulus) %>%
  summarize(count = n(),
            mean = mean(FirstHatch.cycles., na.rm=T),
            SD = sd(FirstHatch.cycles., na.rm=T),
            SE = sd(FirstHatch.cycles., na.rm=T)/sqrt(n())
            )
T_cycles_younger_errorstats$AgeGroup <- "younger"
kable(T_cycles_younger_errorstats,title="Mean & SD & SE",digits=3)
Stimulus count mean SD SE AgeGroup
c 14 12.393 8.261 2.208 younger
f 14 158.958 153.468 41.016 younger
s 14 6.059 5.256 1.405 younger
T_cycles_older_errorstats<-
  T_older %>%
  group_by(Stimulus) %>%
  summarize(count = n(),
            mean = mean(FirstHatch.cycles., na.rm=T),
            SD = sd(FirstHatch.cycles., na.rm=T),
            SE = sd(FirstHatch.cycles., na.rm=T)/sqrt(n())
            )
T_cycles_older_errorstats$AgeGroup <- "older"
kable(T_cycles_older_errorstats,title="Mean & SD & SE",digits=3)
Stimulus count mean SD SE AgeGroup
c 30 9.571 4.382 0.800 older
f 30 102.200 94.979 17.341 older
s 30 3.027 2.635 0.481 older

Now we can make the plot!

T_cycles_combined_errorstats<- rbind(T_cycles_younger_errorstats, T_cycles_older_errorstats)

The following code creates our figure:

ggplot(data=T_cycles_combined_errorstats, aes(x=AgeGroup, y=mean, colour=Stimulus)) +
  geom_point(size=3) +
  geom_errorbar(data=T_cycles_combined_errorstats, aes(ymin=mean-SE, ymax=mean+SE), width=0.05)+
  theme_bw(20)+
  ylab("Latency to hatch (cycles)\n")+
  xlab("\n Age (d)")

slow<-subset(T_data, Stimulus=="s", na.rm=T)
control<-subset(T_data, Stimulus=="c", na.rm=T)
fast<-subset(T_data, Stimulus=="f", na.rm=T)

slowyounger<-subset(slow, TofDay=="5dAM", na.rm=T)
slowolder<-subset(slow, TofDay=="6dAM", na.rm=T)
controlyounger<-subset(control, TofDay=="5dAM", na.rm=T)
contrololder<-subset(control, TofDay=="6dAM", na.rm=T)
fastyounger<-subset(fast, TofDay=="5dAM", na.rm=T)
fastolder<-subset(fast, TofDay=="6dAM", na.rm=T)

mean(controlyounger$FirstHatch.s., na.rm=T)
## [1] 0.4130952
se(controlyounger$FirstHatch.s., na.rm=T)
## [1] 0.07359134
mean(contrololder$FirstHatch.s., na.rm=T)
## [1] 0.3190476
se(contrololder$FirstHatch.s., na.rm=T)
## [1] 0.02760148
mean(slowyounger$FirstHatch.cycles., na.rm=T)
## [1] 6.058824
se(slowyounger$FirstHatch.cycles., na.rm=T)
## [1] 1.858169
mean(slowolder$FirstHatch.cycles., na.rm=T)
## [1] 3.027451
se(slowolder$FirstHatch.cycles., na.rm=T)
## [1] 0.4810511

The shortest latency times were for the control stimulus (younger: 0.41 ± 0.07 s; older 0.32 ± 0.03 s), while latencies for the LS stimulus included the fewest cycles of the vibration pattern (younger: 6.06 ± 1.86; older: 3.03 ± 0.48 cycles).

We conducted analyses of latency both on the subset of trays from which at least one individual hatched and also on the full dataset, assigning a latency of 600 s (i.e., the full playback plus post-playback observation period) to trays in which no embryos hatched.

# IN SECONDS, Anova
aov1 <- aov(FirstHatch.s.600 ~ TofDay*Stimulus, data=T_data)
summary(aov1)
##                  Df  Sum Sq Mean Sq F value   Pr(>F)    
## TofDay            1  446444  446444  12.172 0.000669 ***
## Stimulus          2  335508  167754   4.574 0.012089 *  
## TofDay:Stimulus   2  436399  218200   5.949 0.003397 ** 
## Residuals       126 4621239   36676                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# IN SECONDS, GLM METHOD
glm4<-glm(FirstHatch.s.600~TofDay*Stimulus, family=gaussian(link = "identity"), data=T_data)
Anova(glm4)
## Analysis of Deviance Table (Type II tests)
## 
## Response: FirstHatch.s.600
##                 LR Chisq Df Pr(>Chisq)    
## TofDay           12.1725  1   0.000485 ***
## Stimulus          9.1478  2   0.010318 *  
## TofDay:Stimulus  11.8986  2   0.002608 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If we assign a latency of 600 s to trays in which no embryos hatched, age and stimulus effects remain strong and the interaction effect is much weaker (latency in seconds: age, F, = , P < ; stimulus, F, = , P = ; interaction F, =, P = ).